10/29/2019

Agenda

  • Polynomial regression
  • Step functions
  • Regression splines
  • Smoothing splines
  • Generalized additive models

Recap

Linear B-splines

Cubic B-splines

Example of fit

\[\hat{f}(x) = \hat{\beta}_0 + \hat{\beta}_1 b_1(x)+ \hat{\beta}_2 b_2(x)+ \dots + \hat{\beta}_{K+3} b_{K+3}(x)\]


Regression splines continued…

Natural splines

  • Splines can have high variance at the outer range of the predictors; this is especially visible in the width of the confidence bands near the boundaries
  • A natural spline is a regression spline with additional boundary constraints: the function is required to be linear at the boundary (in the region where \(X\) is smaller than the smallest knot, or larger than the largest knot)
  • For cubic B-splines, this adds \(4 = 2 \times 2\) extra constraints, and allows us to put more internal knots for the same degrees of freedom as a regular cubic spline
  • In R, you just need to change bs to ns!
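
A minimal sketch of the bs-to-ns switch, assuming a data frame dat with a numeric predictor x and response y (placeholder names, not from the slides):

library(splines)
# Cubic B-spline basis: with df = 6 (and degree 3), bs() places 3 internal knots
fit_bs <- lm(y ~ bs(x, df = 6), data = dat)
# Natural cubic spline with the same df: the 4 boundary constraints leave
# room for more internal knots (here 5 instead of 3)
fit_ns <- lm(y ~ ns(x, df = 6), data = dat)

Both models use 6 basis functions plus an intercept; the natural spline simply spends the same budget on a fit that is linear beyond the boundary knots.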

Choosing number and locations of the knots

  • One strategy is to decide \(K\), the number of internal knots, and then place them at appropriate quantiles of the observed \(X\)

  • A default choice is to add knots at the boundaries (total knots = \(K+2\))

  • Given \(K\) internal knots and polynomial degree \(d\), there are \(K+1\) subintervals and \(d + K + 1\) degrees of freedom (\(\beta_0, \beta_1, \dots, \beta_{d+K}\))

  • A cubic spline with \(K\) internal knots has \(K + 4\) parameters or degrees of freedom

  • A natural spline with \(K\) internal knots has \(K\) degrees of freedom: the \(2 \times 2\) boundary constraints remove \(4\) parameters from the \(K + 4\) of a regular cubic spline

Fix a degree \(d\), and use cross-validation to choose the number of knots!
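
A sketch of that strategy with natural cubic splines, assuming a data frame dat with predictor x and response y and the boot package for k-fold cross-validation (all names here are placeholders):

library(splines)
library(boot)
set.seed(1)
# Estimate 10-fold CV error for natural splines over a grid of degrees of freedom
df_grid <- 2:10
cv_error <- sapply(df_grid, function(df) {
  fit <- glm(y ~ ns(x, df = df), data = dat)
  cv.glm(dat, fit, K = 10)$delta[1]
})
df_grid[which.min(cv_error)]  # df (and hence number of internal knots) with lowest CV error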

Splines for classification

Splines can also be used when the response variable is qualitative. For example, consider the logistic regression model

\[\log \left( \frac{p}{1-p} \right) = f(x) = \sum_{k=0}^{K + d} \beta_k b_k(x)\]

Once the basis functions have been defined, we just need to estimate coefficients \(\beta_k\) using a standard logistic regression procedure.

A smooth estimate of the conditional probability \(P(Y = 1 \mid x)\) can then be used for classification.
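
A minimal sketch, assuming a binary response y coded 0/1 and a single predictor x in a data frame dat (placeholder names):

library(splines)
# Logistic regression on a cubic B-spline basis for x
fit <- glm(y ~ bs(x, df = 6), family = binomial, data = dat)
# Smooth estimate of P(Y = 1 | x) on a grid, usable as a classifier via thresholding
x_grid <- seq(min(dat$x), max(dat$x), length.out = 200)
p_hat <- predict(fit, newdata = data.frame(x = x_grid), type = "response")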

Linear logistic regression

Flexible logistic regression

Smoothing splines

Consider this criterion for fitting a smooth function \(g(x)\) to some data:

\[\text{argmin}_{g \in \mathbb{S}} \left\{ \sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g^{\prime \prime} (t)^2 dt \right\}\]

  • The first term is the RSS, and tries to make \(g(x)\) match the data at each \(x_i\)
  • The second term is a roughness penalty and controls how wiggly \(g(x)\) is, via the tuning parameter \(\lambda \geq 0\):
    • The smaller \(\lambda\), the more wiggly the function, eventually interpolating \(y_i\) when \(\lambda = 0\)
    • As \(\lambda \rightarrow +\infty\), the function \(g(x)\) becomes linear

Why second derivative?

  • Derivative of a function: slope of tangent line at each point
  • Second derivative of a function: change in slope of tangent line at each point
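
One consequence worth spelling out: a linear function has zero second derivative everywhere, so it incurs no roughness penalty at all. That is exactly why the fit collapses to a straight line as \(\lambda \rightarrow +\infty\).

\[g(x) = a + bx \quad \Rightarrow \quad g^{\prime \prime}(x) = 0 \quad \Rightarrow \quad \int g^{\prime \prime}(t)^2 \, dt = 0\]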

Choosing \(\lambda\)

  • The solution is a natural cubic spline, with a knot at every unique value of \(x_i\). The penalty still controls the roughness via \(\lambda\)
  • As \(\lambda\) increases from \(0\) to \(+\infty\), the effective degrees of freedom \(\text{df}(\lambda)\) decrease from \(n\) to \(2\)
  • \(\lambda\) should be chosen via cross-validation
  • In R: smooth.spline(X, Y, df = 10)
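
Rather than fixing df by hand, you can let smooth.spline pick \(\lambda\) itself; a minimal sketch (X and Y as in the call above):

fit <- smooth.spline(X, Y, cv = TRUE)  # cv = TRUE: leave-one-out CV; cv = FALSE (default): GCV
fit$df      # effective degrees of freedom implied by the chosen lambda
fit$lambda  # the selected smoothing parameter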

Generalized Additive Models

GAMs allow for flexible nonlinearities in several variables, while retaining the additive structure of linear methods: we calculate a separate \(f_j\) for each \(X_j\), and then add together all of their contributions.

\[y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip}) + \varepsilon_i\]

  • The non-linear fits can potentially make more accurate predictions for the response \(Y\)
  • Because the model is additive, we can still examine the effect of each \(X_j\) on \(Y\) individually while holding all of the other variables fixed
  • The main limitation of GAMs is that the model is restricted to be additive: with many variables, important interactions can be missed

GAM for regression

  • You can use smoothing splines, B-splines or natural splines. You can also mix terms - some linear, some nonlinear, e.g. 

gam(mpg ~ ns(horsepower, df = 5) + ns(acceleration, df = 5) + year)

  • Coefficients are not that interesting; the fitted function values are (see the plotting sketch below)
  • GAMs are additive, although low-order interactions can be included in a natural way, using terms of the form

gam(mpg ~ ns(horsepower, df = 5) : ns(acceleration, df = 5))
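
Putting the pieces together, a sketch that fits the additive model above and plots each fitted \(f_j\) with standard-error bands. This assumes the Auto data from the ISLR package and the gam package, which the slide code appears to use:

library(gam)
library(ISLR)   # assumption: the Auto data set; the slides' actual data object may differ
fit <- gam(mpg ~ ns(horsepower, df = 5) + ns(acceleration, df = 5) + year, data = Auto)
# One panel per term: the fitted f_j for each predictor, with standard-error bands
plot(fit, se = TRUE)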

Holding acceleration and manufacturing year fixed, fuel efficiency tends to decrease with horsepower

GAM for classification

\[\log\left( \frac{p}{1-p} \right) = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip})\]
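
A minimal sketch of a logistic GAM, again assuming the Auto data (the mpg > 30 cutoff is arbitrary, just to create a binary response):

library(gam)
library(ISLR)   # assumption: Auto data again; the mpg > 30 threshold is illustrative only
# Each s(., df) term is a smoothing spline; family = binomial gives the logit link
fit <- gam(I(mpg > 30) ~ s(horsepower, df = 5) + s(acceleration, df = 5) + year,
           family = binomial, data = Auto)
plot(fit, se = TRUE)  # fitted f_j's on the logit scale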

Question time